Recovering dialect geography from an unaligned comparable corpus
Abstract
This paper proposes a simple metric of dialect distance, based on the ratio between identical word pairs and cognate word pairs occurring in two texts. Different variations of this metric are tested on a corpus containing comparable texts from different Swiss German dialects and evaluated on the basis of spatial autocorrelation measures. The visualization of the results as cluster dendrograms shows that closely related dialects are reliably clustered together, while multidimensional scaling produces graphs that show high agreement with the geographic localization of the original texts.
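The abstract does not spell out how the ratio is computed, so the following Python snippet is only a rough sketch of one possible reading, not the paper's method. It assumes that cognate pairs are approximated by a string-similarity threshold (`cognate_threshold`, a hypothetical parameter), that identical pairs also count as cognate pairs, and that the distance is one minus the ratio of identical to cognate pairs. The function name `dialect_distance` is likewise invented for illustration.

```python
from difflib import SequenceMatcher


def dialect_distance(words_a, words_b, cognate_threshold=0.7):
    """Toy dialect distance between two tokenized texts.

    Assumptions (not taken from the paper):
    - a word pair is 'cognate' if its difflib similarity ratio is at
      least cognate_threshold; identical pairs are cognate by definition;
    - distance = 1 - (identical pairs / cognate pairs).
    Returns None when no cognate pair is found.
    """
    vocab_a, vocab_b = set(words_a), set(words_b)
    identical = 0
    cognate = 0
    # Compare every word type of text A with every word type of text B.
    for wa in vocab_a:
        for wb in vocab_b:
            if wa == wb:
                identical += 1
                cognate += 1
            elif SequenceMatcher(None, wa, wb).ratio() >= cognate_threshold:
                cognate += 1
    if cognate == 0:
        return None
    return 1.0 - identical / cognate


# Hypothetical mini-example with made-up word lists.
text_a = ["huus", "isch", "gross"]
text_b = ["huus", "ish", "gross"]
distance = dialect_distance(text_a, text_b)
```

Under these assumptions, two texts that share only identical word forms have distance 0, and the value grows as more of the matched pairs differ in spelling. A pairwise matrix of such values could then be passed to a clustering or multidimensional scaling routine, as the abstract describes.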
Similar resources
Corpus-based Dialectometry: Aggregate Morphosyntactic Variability in British English Dialects
The research reported in this paper departs from most previous work in dialectometry in several ways. Empirically, it draws on frequency vectors derived from naturalistic corpus data and not on discrete atlas classifications. Linguistically, it is concerned with morphosyntactic (as opposed to lexical or pronunciational) variability. Methodologically, it marries the careful analysis of dialect p...
Globalization, Standardization, and Dialect Leveling in Iran
This paper is an attempt to shed light on the effects of modernization, urbanization, monolingual educational system, and mass media as well as the process of globalization on dialect leveling among Persian dialects. In so doing, the first part of the paper elaborates on the relationship between globalization and sociolinguistics, and on the concept of standardization. Also, it discusses some ...
Effective Factors on Naming Practices in Iran: Sociopolitics or Dialect?
Naming as an inseparable sign of a country’s language has attracted the attention of many linguists to formulate and test hypotheses regarding the culture and language of the people of a certain area. Iran appears like a proper destination for conducting a research focusing on naming based on several factors such as geography or chronology. The present article aims to take a specific look at th...
ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain
Corpus resources for Spanish have proved invaluable for a number of applications in a wide variety of fields. However, a majority of resources are based on formal, written language and/or are not built to model language variation between varieties of the Spanish language, despite the fact that most language in ‘everyday’ use is informal/ dialogue-based and shows rich regional variation. This pa...
Visualization as a Research Tool for Dialect Geography Using a Geo-browser
Moving from a traditional dialect geography research methodology to one in which data are processed electronically and where visualization is used as a research tool can be of great benefit to dialect geography. A working environment offering full support for using visualization as a research tool could take dialect geography into the era of eScience. Despite the advent of electroni...